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Abstract 

This paper presents an approach to modeling progressive event-history 
data when the overall objective is prediction based on time-dependent co- 
variates. This approach does not model the hazard function directly. Instead, 
it models the process of the state indicators of the event history so that the 
time-dependent covariates can be incorporated and predictors of the future 
events easily formulated. Our model can be applied to a range of real-world 
problems in medical and agricultural science. 



1 Introduction 

This paper presents a new theory for event history processes that involve a se- 
quence of irreversible, progressive events and associated external time-dependent 
covariates, i.e. covariates not influenced by the occurrence of the events of central 
interest ( Kalbfleisch and Prentice 2002). These covariates are known up to times on 
a discrete scale, say daily scale, for example. Such events signal changing condi- 
tions, which may point to the need for strategic actions that reduce risk associated 
with those processes, for example cancer progression or survival of wine-grape 
perennial crops. By modeling how such event sequences change in relation to 
time-dependent covariates, useful information may be provided to those involved 
in assessing best therapeutic intervention responses, or environmental impacts. 

General event-history data have been well studied. As examples: Weiss and Ze- 
len (1965) and Lagakos et al. (1978) considered semi-Markov models; Hougaard 
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(2000) described a broad range of Markov models; Cook and Lawless (2007 ) pre- 
sented two broad approaches for recurrent event data: modeling the counts of 
events in a time interval and modeling the gap time between two events; Aalen 
et al. (2008) described approaches based on counting processes. Many of these 
approaches can be used to analyze progressive events data. However, when a time- 
dependent covariate is present, the problem becomes thorny, especially if the main 
objective of the analysis is prediction. 

When a single event is under consideration and a time-dependent covariate is 
present, the usual practice is to apply the Cox model (Cox, 1972) or a paramet- 
ric proportional hazards model (Collett, 2003). The advantage of the Cox model 
is that if the hazard function is only related to the covariate evaluated at the cur- 
rent time, then we can plug that covariate value into an expression for a partial 
likelihood function, regardless of the values of the covariate at other time points. 
However, this causes a loss of efficiency, since the information contained in the 
covariate between the gap times of events are not used. Cox (1972) argued that 
the loss of efficiency is not much unless either:(l) the model parameter is far from 
zero; (2) censoring is strongly dependent on covariates; or (3) there are strong time 
trends in the covariates. While the first two issues may not concern us, the third 
is crucial for phenological data since the associated climate variables usually have 
strong seasonality (and will exhibit as a dominant local trend between event-times 
within a season). On the other hand, the Cox model is not suitable for prediction, 
since it does not extrapolate beyond the last observation. A parametric proportional 
hazards model might be a good choice for prediction. But it requires explicit dis- 
tributional assumptions for the time-to-event, which may be mis-specified. Also, 
if the hazard ratio is related to the covariate evaluated at several time points at 
and prior to the current time, the likelihood function may involve a complicated 
integration. 

When multiple events are of interest, to deal with time-dependent covariates, 
the usual approach is to apply a Cox model for each event where time-dependent 
covariates are present (e.g. Hougaard, 1999), or use a parametric model to model 
the hazard rate and to incorporate the covariates just as in the parametric propor- 
tional hazards model (e.g. Cook and Lawless, 2007). These approaches induce 
similar problems to those in the single event case. 

In this paper, we introduce an approach based on modeling the process of event 
state indicator. In this approach, all the available information contained in time- 
dependent covariates can be easily incorporated in the likelihood function, and the 
construction of a predictor is straightforward. Also, this approach does not impose 
strong distributional assumption on times to events. 

The paper is organized as follows. Section 2 presents a model for a single 
event. There our basic assumptions are introduced and estimation and prediction 
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procedures for the model are described. Also, the estimation for the case of non- 
informative right censored response is considered. Section 3 presents a model for 
sequential events, which is an extension of our model for single event but with a 
few additional assumptions. In section 4, we test our model for single event by 
applying it to the blooming event of pear trees. There a cross validation procedure 
is used to evaluate our prediction of future events. The uncertainty associated with 
the prediction is also assessed. The final section summarizes our methods, and 
gives pointers to possible future work. 

2 Model for a single event 

This section concerns the case of a single phenological outcome called an "event", 
for example "death". The data consist of the times to the occurrence of that out- 
come for N experimental subjects, i = 1, • • • , N. 

2.1 Basic setup 

In the sequel, upper case letters denote random variables and lower case ones, their 
realized values. 

We adopt the following assumption in this section: 

Assumptions 1. Only one event can occur for each individual, and once it has 
occurred, it remains in the "occurred" state thereafter. 

We assume a discrete time scale with a well-defined origin to, that we take to 
be to = without loss of generality. For individual i, let Tj denote the random 
time to occurrence of the event. At each time point t = 0, 1, • • • , classify the state 
of the event for each individual as "occurred" or "not occurred". At time t, let Y^t 
denote this state, being 1 or according as the event has "occurred" or not. Then, 
time to event Tj and state indicator Y; h t have the following relationship: 

Yi, = 0, Y iA = 0, • ■ ■ , 1^,(21-1) = 0, Y itT . = 1, y i)(Ti +i) = !,-••, (1) 

where Yj ; t is 1 for all t > Tj by Assumptions 1 . 

Associated with each individual is a time-dependent covariate vector, which 
is observed on the same discrete time scale. Denote its value at time t (t = 
• • • , —1, 0, 1, • • • ) by Xi : f Note that a fixed covariate is a special time-dependent 
one and so is subsumed by our theory. For individual i, we further denote the co- 
variate process evaluated at all time points, i.e. {• • • X^-i = Xj, -i, -X^o = 
%i,o, = Xi t i, ■ ■ ■ }, as X^t'ez- Similarly, we write ij,o:t as the set of state 
indicators Y^t evaluated from time origin to time point t (t = 0, 1, ■ ■ ■ ), i.e. 
{Yi,o = yi,o, ■■■ ,Y itt = yi,t}- 
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2.2 Probability model 

The conditional probability distribution of o : * given t > e z is 

t 

P(Fi,o:t \X ijt > eZ ) = P(^i,o = Vi,o \X ijt >ez) JJp(^, s = y*,* |*i,o : ( s -i), -^i.t'ez) , 

s=l 

(2) 

where P (•) is the probability set function. This expression can be simplified using 
the following result: 

Proposition 1. For each individual i and single event in Assumption 1. conditional 
on X^t'ez, the stochastic process {y,t : t = 0, 1, • • •} is a first order Markov 
chain, i.e. 

P (Yi,t = Vi,t |*i,0:(t-l)> X i,t'ez) = P ( Y i,t = Vi,t = Vi,(t-1), X i,t'&) , 

(3) 

for allt = 1, 2, • • • aw/ yj )t G {0, 1}. 

Proof. In the proof everything will be conditional on X ijt i e % and is omitted for 
simplicity. Since for each individual i and all t = 0, 1, ■ • • , Y^t is a binary 
random variable, it suffices to separately consider only two cases, Y^ u_i\ = and 
Y i,(t-i) = !• Firstly, Yj i(t _i) = implies that Y iy0 = 0, • • • , and Y i){t _ 2) = 0, 
making {l^o = 0, ■ ■ ■ ,Y it t t _ 2 \ = 0, Y it = 0} the only possible probability 
event for Y i .u_i\ and thus equivalent to {Y^ ( t _;n = 0}. 

Secondly, for the type of single event under consideration, if for some t' > 0, 
= 1, then y ii(t _ 1} = 1 for all t > t' . Thus, when r i)(t _ 1} = 1 (t > 0), 

we have 

Pfct = I/i,* |*i,0:(t-l)) = P ( y M = i/M |*i,0:(f-2), (t-1) = l) = 1, (4) 

□ 

which completes the proof.* 
Equation (2) then simplifies to 

t 

P (Y l:0 . t \X ijt , eZl ) = P (y ij0 = yi,o l^i.t'ez) n p ( y *' s = = -^i.t'ez) • 

s=l 

(5) 
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For individual i,fhe previous equation and ( 1 ) imply that the conditional probability 
that the event occurs at time ti, given all the covariate values Xj jt / e z is 



p {Ti = n 



Xi,fez) =P(no = 0, Fi,i = 0, ••• , y i)(ti _i) =o, y Mi = l|X i)t / eZ ) 



r*i— i 



= p(Y i , = o|x M , 6Z ) • Jlp(y i , a = o|y i)(a _ 1) = o, x iit , ez ) 



-S=l 



p(y iiti = i|y ii(ti _ 1) = o, x i>t / ez ) . (6) 



Now we are ready to build a regression model based on this probability model. 
2.3 Regression model 

Assume that the occurrences of the events of different individuals are indepen- 
dent realizations from the same population. We require additional assumptions 
about the relationship of the occurrence of the event and covariate to limit the to- 
tal number of parameters. In Equation (6), the probability of an event occurring 
at time U is conditioned on covariate values evaluated at all discrete time points 
• • • , —1, 0, 1, • • • . In real applications, the occurrence of an event usually only 
depends on the covariate values at and prior to the occurrence time. Furthermore, 
in some situations, we may assume that at a time point t > 0, the state of the event 
mainly depends on the covariate values at the current and several previous times, 
or some weighted average of them. In practice, we want to make some reasonable 
assumptions so that the total number of covariates (and consequently the total num- 
ber of parameters in the regression model) is limited, and the number of covariates 
does not change over time. 

For the purpose of illustration, simply assume that for individual i at time point 
t, the state indicator t is only related to the covariate values evaluated from time 
t — K tot, i.e. { (t—K) i " " " ) ^i,t}, where K is constant. Now, for individual 
i at time t, no matter if X^t is a vector or not, the total number of covariate values 
that are related to Y^t is finite and fixed, and we will put them together as a vector 
denoted by X. ht - Then in Equation (6), term X ijt > e % on the right hand side (RHS) 
can be replaced by j. 

We further assume that time origin is the earliest time an event can occur, 
otherwise the data is not useful for studying the probability of the occurrence of 
the event. Then we have 



By virtue of the Markov property of {Y^t :t = 0, 1, • • • }, for modeling P (Tj = ti |Xj t t'ez)> 
it suffices to model P (Y^t = yi,t \Yi,(t-i) = 0, Xi t t) for t = 0, • • • , U and 



P(^,o = j/i,o|*i,o) =P(y,o = y i ,o\Y i - 1 =0, X i>0 ) . 



(V) 
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Vi,t G {0, 1}. Write 



p i> t = p(y i , t = i|r i ,( t _i) = o, * M ) . (8) 

Then since y^t G {0, 1}, we have 

P(*M = yi,t|n(t-i) = °> <*m) = p m (i-Pm) 1_!/m • (9) 

For each fixed individual i, P^ t is a function of t and t - Now, to build a regres- 
sion model, we will choose a useful explicit form for this function with unknown 
parameters, and carry out statistical inference for these parameters. 

If we want to restrict the functional form of P^t to a linear function of the 
parameters, then for individual i, at each time point t = 0, • • • , U, we may 
consider a linear regression model for binary events. Consider a monotonic link 
function g : (0, 1) — > (— oo, oo) (e.g., g could be the logit or probit function). We 
assume that g (Pj,t) equals a linear function of the covariate vector Xi^ t , i.e. 

g(P i , t ) = /3fX i , t , (10) 

where (3 t is a parameter vector which remains the same across different individuals 
i, but may vary with time t. The superscript T stands for the transpose of a vector 
or a matrix. 

By Equation (6) - ( 10), we have 

p(Ti = u \Xi,t>&) = g- 1 (P?K,u) n i 1 ~ a" 1 (PjXi,*)) , (ID 

where g~ l is the inverse function of g. To achieve computational tractability, we 
take fit to be a constant vector over time, so the subscript t of (3t in the above equa- 
tion can be omitted. Under the independence assumption, the likelihood function 
of the data is 



N 



i=i 



u-i 



a' 1 (A,*,) II {i-g- 1 {f^s)) 



s=0 



(12) 



One can now proceed with maximum likelihood (ML) or Bayesian methods to 
estimate parameters. 



2.4 Non-informative right censoring 

If the event has not occurred for an individual by the end of the study or an individ- 
ual left the study before the event occurs, we get a right-censored observation. In 
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this paper, we consider non-informative right censoring, i.e. the time to the event is 
independent of the censoring mechanism. The methodology is derived from Collett 
(2003). 

Here, when writing the conditional probability of an event occurring at some 
time point given covariate values Xi^t'ez, we will omit the conditioning variable. 
All the probability expressions in this section are then conditioned on X± t 'ei- 

For each individual i = 1, • • • , N, we have an observed time ti, which is 
either an event-time, or a right censoring time. We denote this observation as a 
random variable n. Then the value of n is U. Now, for individual i, let 5% be 
an indicator which takes values 1 or 0, according as we observe the event or not 
because it is right censored. By the non-informative censoring assumption, we can 
assume that each individual i is associated with two independent random variables: 
event time Tj and censoring time Cj. If the observation for individual % is censored, 
we have 

Q < Ti and n = d, when 5 t = , (13) 

otherwise, we have 

d > Ti and n = T h when 5* = 1 . (14) 

Now, it is easy to see that n = min (T, Q), and 

P( n = t,5i = 0) =P(Ci = t, Ti >t)=P(d = t)P(T t > t) , (15) 

where the second equality holds because of the non-informative censoring assump- 
tion. Similarly, we have 



P(7i = t,Si = 1) = P(Ti = t, d >t)=P(Ti = t)P(d > t) 
The likelihood function for the observations t\, ■ ■ ■ , then is 



(16) 



N 

L = Y[P(n = u, Si) 

i=i 

N 



1-Si 



l[(P(C i = t i )P(T i >U)) 

N 

\[V{C i = t i ) 1 - 5i V{C i >t i ) 



i=l 



(P(T i = t i )P(C i >t i )) di 

■ N 

.1=1 



l-8i 



(17) 



By the non-informative censoring assumption, term fliLi P (C» = ^i) 1 &i P (C« > U 
does not involve parameters that are related to the distribution of event-time T. 
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Therefore, to find the maximum likelihood estimator (MLE) of the model parame- 
ters, it suffices to maximize the following function 



A? 



L' (f3) = l[P(T i = t l fp(T l >t l 



l-Si 



1=1 



and 



$mle = ArgmaxL' (f3) 



(18) 



(19) 



Term P (Tj = ti) in Equation ( 18 ) is given by Equation (11) (note that the con- 
ditioning variable t r e % has been omitted in the current expressions), while term 
P (Ti > ti) can be calculated as follows: 



P(T t >ti)=P(Y h0 = 0, Y iA = 0, ••• , Y hti = 0) = H(l-g- 1 {(3 T X hS )) . 

s=0 

(20) 

Then we can re-write Equation ( 1 8 ) as 



N 



l' 09) = n 



i=i 



k-l 



s=0 



6, 



1-5; 



nci-ff- 1 ^,.)) 

(21) " 



s=0 



Now, we can easily estimate (3 using Equation (19) if it is assumed constant over 
time. 



2.5 Prediction 

To use the regression model to predict the time to a future event, we must know 
the future values of time-dependent covariates in advance. However, generally we 
will not know them and hence must predict them. We therefore assume that their 
predictive distribution is available in order to make progress on this problem. 

With that understanding and time origin 0, suppose the current time is t c > 0. 
For a new individual, one whose data were not used for parameter estimation and 
whose event-time is unknown, suppose the event has not occurred up to t c . This 
subsection presents a predictor for the event-time of this individual, denoted by T*, 
with corresponding state indicator Y t * at time t > 0. 

To construct that predictor, we denote the covariate vector for this individual 
evaluated at time t as X%. Similarly we write the "new individual version" of Xi :t 
as X£. Since we know the covariate values up to time t c , X% may be decomposed 
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into two vectors: one vector X£ obs consists of covariates values evaluated from 
time to time t c , which we observed exactly, and the other, X£ wed , consists of 
predicted covariates values from t c + 1 to t, whose predictive distributions are 
given by another model. Furthermore, denote the estimated parameter vector as 
(3, and the covariates and state indicator used to estimate /3 as X tram and Y tram 
respectively. 

If we knew the true value of f3, Equation (11) would imply the predictive dis- 
tribution of bloom time T* . In other words the probability of the event occurring 
at time t c + K for any K > 1 given X^ obs , the observed covariate values for the 
new individual, would be 

(T* = t c + K\Xl +K>obs ) 

= l ^ =tc + K \ X t c +K,obsi X t c +K,pred) ^ {^t c +K,pred) 
K-l 

s=l 

We attach a subscript /3 on that probability function to emphasize we are using 
the true parameter values. The problem is that we do not know the true (3. So 
we replace j3 by /3 in Equation (22) to estimate the predictive distribution of the 
event- time T* as: 

P /3 ( T * =tc + K \X? c+ k, obs ) 
. K-l 
= / g~ l (P T K +K ) Hi 1 - 9^ (P T K+)) dP (K+K, P red) ■ (23) 

J 8=1 

If the predictive distribution of X^ +K pred is given by another model, the integral 
in this equation may be calculated by the Monte Carlo (MC) algorithm. Generate a 
sample of large size L from the distribution of X^ +K pred , and denote the sample 
points as X^ +K pred (/)(/ = 1, • • • , L). Then we may approximate the predictive 
probabilities by 

1 L 

P /3 ( T * =t c + K\Xt,obs) ~ Y ( T * =t c + K \X? c+ K,obsi X t c +K,pred (0 ) ■ 

1=1 

(24) 

This "plug-in" approach for predictive distribution is generally criticized as 
failing to take into account the uncertainty of the unknown parameter. But, if one 
takes the Bayesian approach, the uncertainty of the unknown parameter is incorpo- 
rated in a natural way. Suppose in an estimation procedure, one takes the Bayesian 
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approach and gets P (f3\X train , Y train ), the posterior distribution of p. Then the 
predictive distribution of T* is: 

P (T* = t c + K \X t * c+K , obs , X train , Y tram ) 

/ / r> (rp* / i I v* V* ft Ytrain \rtrain\ 

~ i \ V \ L — t c + i^ \*t c +K,obsi *t c +K,predi P> A > Y ) 

dP ((3\X train , Y tram ) dP {X t * c+K , pred ) . (25) 

One may expect that the Bayesian approach will in general be superior to the "plug- 
in" approach in terms of prediction. However, Smith ( 1998 ) showed that for many 
models, when assessed from the point of view of mean squared error of predictive 
probabilities, the "plug-in" approach is better than the Bayesian approach in the 
extreme tail of the distribution. It is not directly clear if this argument fits our 
model, but the point here is that we think both approaches make sense. 

3 Model for multiple events 
3.1 Basic setup 

Suppose there are N individuals, and 5 > 1 different events may occur to each 
individual. We make the following assumption: 

Assumptions 2. For each individual, the 5 events have the following properties: 

1 . They occur in a fixed time order; 

2. For an event to occur, all the events prior to it must have occurred. 

3. For a fixed individual, no two different events occur at the same time point. 

By these assumptions, we can label each event by the time order in which it 
occurs, using the symbol s = 1, , ■ ■ ■ , S. When we talk about the occurrence of the 
s th event, all the previous events from the I s * to the (s — l) st must have occurred. 

Now, for an individual i, there are 5+1 states: no events have occurred, the first 
event has occurred but the second hasn't and so on to the last event has occurred, 
i.e. all 5 events have occurred. We will denote these states by 0, 1, • • • , 5, 
respectively. For the i th individual, we will denote the random variable for the 
time to the s th event as Tj )S , and denote its value as U :S , s = 1, • • • , 5. We 
also create a state indicator Y^j with Y; h t = I £ {0, 1, • • • , 5} indicating that 
the individual i is in the I th state. For the i th individual, starting from the time 
origin 0, we consider discrete time points 0, 1, • • • , i, • • ■ , U t 2, ■ ■ ■ , U t s- The 
time origin satisfies < t^i. The value of Y^t can only be I or I + 1 when 
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Y ift _ x = I G {0, 1, • • • , S - 1}. Also, y i>t = 5 for all t > U,S- Then, the 
event-times {T^ s } and state indicators {y,t} have the following relationship: 

y, = o, • • • , y ijtiil = i, y^t^+i) = i, ■ •• , ii,(t ijS _i) = 5-1, 

ii,ti, s = 5 1 , ii,( tiiS +i) =S, ■■■ (26) 
Furthermore, assume that at each discrete time point, we observe a covariate vector 

With the above notation, we will continue to let U: t denote {y , o = yi, o 5 ■ ■ ■ , Y% , t = 
y i>t }, and X ijt > eZ denote {•• • X ij _i = x ij _i, X ij0 = x i>0 , X i;1 = x itl , •••}, 
as we did above for a single event. 

3.2 Probability model 

For multiple progressive events that satisfy Assumptions 2 and each individual 
i, the conditional probability of y,o:t given X^t'ez still satisfies Equation (2). 
However, the stochastic process {Y^t :t = 0, 1, • • • } is no longer necessarily a 
first-order Markov chain. Instead, the following result holds with 1 = 1, ■ ■ ■ , S — 
1: 

P (y,t = Ui,t \Y ii0: (t-i), X ijt i eZ ) 

P{Y i ,t = yi,t\Y i , {t - 1) =0, X itVeZ ), if < t<t itl 

P (ii,t = Vi,t \Y it (t-i) = I, Tj ; i = ti, i, • • • , Tij = Uj, X ijt i eIl ) , HUj <t < ij^j+i) 
1, if* ii s<*. 

(27) 

This result and Equation (2) implies that for each individual i, 

P (Ti, i = tj, i, Tj i2 =U,2, • • • , T i: s = U t s \X ijt i e z) 

U, i-i 

p(y <)ti>1 = i|r i)(tiil _ 1) = o, x i)t / ez ) [] p(y i)t = o|y i)(t _ 1) = o, x i;t , eZ ) 



t=0 



5-1 



k Z=l L 

*», (i+l) -1 

*=*i,i+i 



(28) 
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With I = 0, 1, ■ ■ ■ , S - 1 we write 



p m = / Pfc = l|n(*-i) = °> *i,fez), ifi = 

*' fW "l P(y M = Z + l|y,, ( t -i) = /, T iA = t hl , ■■■ , T it i = U,u Xi,t>ez), if / > 0. 

(29) 

Since conditional on Yj ( t _ 1 - ) = Z, y ; t can only take values I or I + 1, once we get 
a model for Pj ;t (Z) for Z = 0, ■ ■ ■ , S — 1, we can model every term in Equation 
(28). 

Compared with the expression of P (Tj = U \Xi : f e z) for a single event (Equa- 
tion (6)), Equation (28) is much more complicated. In the single event case, the 
Markov property implies that, to model P (Ti = U \Xi tt >£i,) for each individual i, 
it suffices to model the conditional probability P (Y^t = yi^ \ = 0, X^ t ), 

which is a function of only t and X- h t- However, now we need to model P^t (I) 
for I = 0, • • • , S — 1, which is a function of not only t and f e z, but also 
of t^i, • • • , tij, and event state I. To simplify this probability model, We need 
to make extra assumptions on the dependences among different events. A simple 
way is to assume that {y t : t = 0, 1, • • • } is a Markov chain, and then we can 
proceed just like the case of single event. However, this assumption may be too 
restrictive in many cases. Below, we will provide an alternative approach based on 
other assumptions. 



3.3 Regression model 

Assume as above that y 5 1 only depends on covariate values evaluated at a finite 
number of time points, X^^. All the X ijt i e % terms in Equation (29) and on the 
RHS of Equation (28) can then be replaced by Xi :t . Also, we write {y,o = 
Vi,o, ■■■ , Yi,t = Vi,t}, t = 0, 1, • • • , as Y ij0:t . 

In the Equation (29) for P M (I), T iA , ■■■ , T ijh I = 1, • • • , S - 1 and 
covariate vector Xi :t are all conditioning variables. Let us treat t^i, • • • , Uj as 
time-dependent covariates, and assume an explicit form (with unknown parame- 
ters) for Pj ; t (I) as a function of i , • • • , Uj and Xi t t- 

For example, let g : (0, 1) — > (—00,00) be a monotonic link function, and 
assume g (P^t (I)) is a linear function or a polynomial of i^i, • • • , tij and Af^t. 
For different 1 = 0, • • • , 5 — 1, the numbers of conditioning event- times in the 
expression of P^ t (I) are different. We use a trick to make the number of covariates 
constant over time so that our mathematical expressions can be simply formulated. 
For each individual i, we define the following time dependent covariates: 
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Now for every 1 = 0, ■ ■ ■ , S — 1, (I) is a function of T[ l5 • • • , T' 
Ai t, / and t. If we assume g (P^t (/)) is a linear function of T/ l5 • • • , T' 
and Xi jt , we can define a covariate vector 

Z i>t =(xT t , ^,i(t), ^, s _i(t)) T • (31) 

Similarly, if we assume g (P^t (Z)) to be a polynomial function of them, we can 
define t as a vector which consists of the terms of the polynomial. Under both 
assumptions, we can write 

g (P M (J)) = PliZ^u for I = 0, ■ ■ • , S - 1 , (32) 

where /3 t: ; is a parameter vector that varies with time t and event state Z but remains 
the same across different individuals. In many situations, we may reasonably as- 
sume fit, i is constant over time. Then it is a function of only Z, and we will write it 

as fr. 

Now, if all N individuals are independent, and for each individual, we observe 
all the S events (i.e. no censoring), then the likelihood function is 

N 

L (A) = JJP {Ti,i = Tj j2 = ii,2, ■ ■ ■ , ^i.s = i?,s \Xi ;t 'ez) 
i=i 

AT r ti.i- 1 

=n ^(^m) n (i-^^m))- 

i=l l t=0 

S-l r *«, -I "J 

n s- 1 ^,*) n (i - 9- 1 (a 1 xo) ^) 

/=i l *=*i,i+i - 1 j 

In the above model, we are making an explicit assumption on the conditional dis- 
tribution of Yij given all the previous events times. By successive conditioning, we 
actually are implicitly making an assumption about the joint distribution of all the 
S event-times Tj i, • • • , T^s (see Equation (28)). Sometimes, this assumption 
may be not easy to verify. On the other hand, even if fii is constant over Z, there are 
5 — 1 more covariates than the single event case. When S is large compared to N, 
the estimates of parameters will have large standard errors. 

3.4 Estimation and prediction 

We consider the model defined by Equation (30) - (32). When there is no cen- 
soring, the likelihood function is given by Equation (33). We now turn to the 
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case where the responses are non-informatively right censored. First, we may as- 
sume Tj 5 o = 0. Note that we have assumed T^j / T^i for / / V and 1,1' = 
1, 2, • • • , S 1 in Section 3.1 However, it is possible that T^o = = T^\. For 
each individual i, we observe several times, the last one being the event-time for 
the last event or censored time, and the previous times as the event-times prior to 
the last observation. If the last observed time is the time to the last event, there is 
no censoring; otherwise the observation is right censored. 

Denote the last observed time by the random variable T{ and its value by ij, 
where the censoring time is generated by a random variable C%. Suppose for each 
individual i, prior to the last observed time ti, we observe Ki G {0, 1, . . . , S} 
events. Without censoring Ki = S — 1, = tj = Tj g and > Tj s; otherwise, 
i~i = U = Ci and ti : K { < Ci < Tj^+i- Just as before, we define a censoring 
indicator Si, which takes values or 1 according as the last observation is censored 
or not. Then we can easily show that, under the non-informative right censoring 
assumption, the MLE equals the parameter value that maximizes the following 
function, 

N f s> 
i=i 

P {Ti,i = *i, i , • • • , T i)Ki = U t Ki, Ti,Ki+i > U jXj^/g^) 1 S ' I . 

(34) 

The first factor on the RHS of the above equation is 1 when <5j = 0, and when 
Si = 1, it is given by Equation (28). When 5i = 1, the second factor is 1 and when 
Si = 0, it is 

P {Ti,i = U : i, • • • , T iiKi = ti t Ki, T it K i+ i > ti |X ijt / 6Z ) 

Ki-l 



II P { Y hti,«+i) = l + l Y i,(t li(l+1) -1) = l ' T iA = • . T iJ = U,h x itt 
1=0 L 

I~] P (Yi,t = I |*i,(t-i) = I, Ti,i =U,i, ••• , Tij = tij, X iit > e z) 



U 

[ | P {Yi,t = ^ |Yj 5 ( t _i) = Ki, T it i = ti, i, • • • , T i: i = t iiKi , X ijt / 6Z ) 

P(y i)O = 0|X M , 6Z ) . (35) 

Once model parameters are estimated, the prediction procedure is not very dif- 
ferent from the case of a single event. We only note here that it will be compu- 
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tationally challenging to predict all the future events for a new individual at the 
same time. Instead, we focus on the time for the next event conditioned on known 
previous event-times. 



4 Example 

Here, we briefly show an application of our model to a single phenological event - 
blooming of pear trees. 



4.1 Data and objectives 

Representative bloom dates of pear trees in Summerland of British Columbia, 
Canada, between 1937 and 1964 were recorded. In each year, a pear tree blooms 
at most once, and the bloom date is counted as the number of days from the first 
day of a year to a representative bloom date of all the pear trees in the area under 
consideration in that year. Note that the time origin (to) here is set to January I s * 
of each year. Daily maximum and minimum temperatures in the same area in the 
corresponding years are also collected. It is well known in the agricultural sci- 
ence community that the timing of a bloom event is closely related to a quantity 
"AGDD" - the accumulation (cumulative sum) of the so-called growing degree 
days (GDD) defined by 

t 

AGDD (t) = GDD ( k ) > ( 36 ) 

k=to 

where to is the time origin, t is the current time (discrete; on daily scale), and 
GDD is defined as 

( T m j n (k)-\-T max (fc) rp -r T m j n (fc)+Tmqx (&) ^ rp 

GDD(k) = \ 2 " ± base 11--- 2 1 base (37) 

[ otherwise 

where k is discrete time with the unit of day, T m i n (k) and T max (k) are daily mini- 
mum and maximum temperatures, and Tfe ase is a thresholding constant temperature 
which is unknown. Note that (1) AGDD is a function of time; (2) Ti, ase is an un- 
known parameter; (3) AGDD is not a continuous function of Tfe ase . The objective 
of this data analysis is to predict timings of future blooming events and to estimate 

Tbase • 



4.2 Estimation 

Exploratory analysis suggests that the auto-correlation of the bloom dates over 
years are negligible. We may therefore assume that these bloom dates on different 
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years are independent realizations from the same population. We apply the regres- 
sion model for single progressive event described in section 2.3 to the dataset, using 
the logit function as link function. Note that years now play the role of "individ- 
uals". We assume in any given year, that on any day, the probability of blooming 
is only related to AGDD evaluated at the current time, i.e. that the vector t 
contains an intercept and the AGDD value on the current day. This model then 
contains three unknown parameters: the intercept, the coefficient for AGDD eval- 
uated on the current day, and T^ ase . The MLEs of them are: (^intercept = —22.27, 
Pagdd = 0.07 and $T base = 2.97. A question about these estimators is whether 
they are consistent. Wald's (1949) famous sufficient conditions for the consistency 
of MLE requires the likelihood function to be a smooth function of the parameters. 
This is not satisfied in our model because of the presence of Tf, ase , but we were not 
able to address this issue through theoretical analysis. 

Instead we explored the issue of consistency through a simulation study. More 
precisely, we generated 1000 data pairs of bloom dates and daily average temper- 
atures of size 30 years, 80 years, 150 years, and 400 years respectively. We then 
applied our model to these datasets and calculated the MLE of each parameter. For 
each sample size and parameter, we used the average of 1000 values of the MLE to 
estimate the mean of the MLE, and their sample variance to estimate the variance 
of the MLE. We found that estimated means of the MLEs get closer to the true pa- 
rameter values as the sample size increase from 30 to 400. Moreover, the estimated 
variances of the MLEs decreases as the sample size increases. This suggests the 
MLEs are consistent and gave us confidence in the value of the estimators. 

On the other hand, rather than to rely on the validity of asymptotic theory to 
estimate the uncertainties associated with the MLEs, we used bootstrap confidence 
intervals. However, the complexity of our model makes it unclear whether the 
bootstrap estimates of the quantiles of the MLEs converge to the true quantiles. We 
again performed a simulation study to assess that convergence, the details being 
similar to those above and hence omitted for brevity. The results show that the 
lengths of quantile-based 95% bootstrap intervals of the MLEs get very close to 
those of the estimated 95% intervals of the MLEs obtained using the simulated 
data when the sample size increases. The bootstrap intervals are slightly biased 
though (the ends of the bootstrap interval are always slightly smaller or bigger than 
the estimated interval using the simulated data). Overall the results backup use 
of the bootstrap intervals to reflect uncertainties in the MLEs. The quantile based 
95% bootstrap confidence intervals are, for the intercept, (-37.95, -16.62), for the 
coefficient for AGDD (t), (0.055, 0.122), and for T base , (1.93, 3.81). We see that 
the both the intercept and the coefficient for AGDD (t) differ significantly from 
at the 5% level. 
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4.3 Prediction 



To use (23) or (24) to predict the representative bloom date of the pear trees of 
the next year in Summerland, we need to predict the daily average temperature 
{{Tmin (i) + T max {t)) /2) of that year first. After removing seasonality in daily 
historical temperature data, We fit an ARIMA model to the residual historical 
temperature. Note the deseasonalized historical temperature series contain weak 
periodic signals with longer periods, and therefore is not strictly stationary. Never- 
theless, ARIMA may still be used as a a reasonable approximation. By comparing 
Bayesian information criterion (BIC) for ARIMA models of different orders, we 
settled on ARIMA (3, 0, 1) as our final model for generating future temperatures 
in any given growing season. 

We now turn to prediction. At the end of the current year, we generate 1000 
series of the daily average temperatures of the whole next year using the fitted 
ARIMA (3, 0, 1) model. We then use (24) to calculate the probability of the 
blooming event happening on each successive day of the following year. This way 
we get a (discrete) predictive distribution for the timing of that blooming event. 

So suppose that we are on the end of the first day of the new year with its 
observed average daily temperature. We then apply the ARIMA model to gen- 
erate 1000 temperature series starting from the second day of the new year. As 
above, we can use (24) to get another predictive distribution for the timing of the 
blooming event. We repeat this procedure on each successive day, until the true 
bloom date, at which time prediction ceases. If the true bloom date is around day 
129, we then get 129 successive predictive distributions. What we expect to see 
are increasingly more accurate predictions as the days progress toward the bloom 
date and more and more information about the daily averages temperatures come 
to hand for that season. Growing confidence in that prediction would provide an 
increasingly strong basis for management decisions. 

To see if our expectations are realized, we performed a leave-one-out predic- 
tion procedure - at every step, leave out one year of data for assessment and use 
the remaining years for training the model to predict the bloom date in the left-out 
year. For each left-out year, we follow the prediction scenario described above. 
As a result we get 28 years (1937-1964) of assessments, with a total of 3523 pre- 
dictive distributions, the average of the bloom dates for those years being about 
day 126. For each of these predictions, we calculate the median of the predictive 
distribution as a point prediction of the new bloom date. Along with that we calcu- 
late a quantile-based 95% prediction interval (PI) for the new bloom date. With all 
the 3523 predictive distributions, we then can estimate the root mean square error 
(RMSE) and mean absolute error (MAE) of the prediction, as well as the coverage 
probability of the 95% PI. The results are as follows: the RMSE is 5.65 days; the 
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MAE is 4.36 days; the estimated coverage probability of the 95% PI is 99%; the 
average length of the Pis is 30 days. 

The coverage probability of the 95% PI is too high, plausibly because in the 
ARIMA(3, 0, 1) model we have incorporated in the random noise term, the vari- 
ability in the temperature series caused by deterministic periodic signals other than 
seasonal variation. In any case, reducing the variance of the white noise in the 
ARIMA(3, 0, 1) model by half of its estimated value yields improvement and 
we get: the RMSE is 5.79 days, the MAE is 4.33 days, the estimated coverage 
probability is about 94%, and the average length of the Pis is 21 days. 

As noted above, we expected the prediction to become more accurate as time 
approaches the real bloom date. To check this, we calculated the MAE and the 
average length of the 95% Pis each day over the years of interest, beginning 90 
days prior to the bloom date (call it "lag -90") to 1 day prior to the bloom date 
("lag -1"). The results for the MAE and the average length of the 95% Pis are 
shown in Figure 1 and 2 respectively. We see that the MAE does become smaller 
and the average length of the 95% Pis, shorter as the actual bloom date approaches 
in line with our expectations. In fact, by the time we reach one month prior to the 
bloom date, the prediction has become quite accurate (the MAE is about 3.5 days). 

The above results for prediction are influenced by two models: one is our re- 
gression model for a single progressive event, the other is a crude ARIMA model 
for daily average temperature. To check the pure performance of our regression 
model, we performed the leave-one-out procedure again. But this time, we assume 
all the future daily average temperatures are known. Note that, in this case, we 
cannot give a sensible estimate for the coverage probability of the 95% Pis since 
for each test year, we can only get one predictive distribution. The results are very 
good: the RMSE is 2.64 days, the MAE is 1.89 days, and the average length of 
the 95% Pis is 9.21 days. Although this is no longer a real prediction, these re- 
sults tend to validate our regression model for the blooming event. This finding 
also demonstrates the importance of modeling the covariate series accurately and 
points to the need of improving the temperature forecasting models. 

5 Concluding Remarks 

The regression models presented in this paper aim at the prediction of the times of 
progressive events when time-dependent covariates that are known up to discrete 
time points are present. Instead of directly modeling the hazard function, we model 
the process of the binary state indicators. This way, all the time-dependent infor- 
mation can be easily incorporated by considering a model for a binary variable at 
each time point. When there is only a single event, the process of the state indica- 
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Figure 1 : Change of the MAE with the change of lag. The point prediction becomes 
more accurate when time approaches the bloom date. 



tors is a Markov chain. But when there are multiple events, that process does not 
necessarily have a Markovian structure. In this case, some additional assumptions 
are needed for simplifying the probability model and circumventing computation 
challenges that would otherwise arise. Application of our approach to bloom date 
data has shown that the prediction using it can be quite accurate. Although orig- 
inally designed for phenological data, these models should be useful for a broad 
range of survival data. 

A restrictive distributional assumption in our models is that the process of the 
state indicator needs to be time-homogeneous. One way to relax this assumption 
might be to allow the model parameters to change with time. Another restrictive 
assumption is that in the multiple events case, we require that no two events can 
happen at the same time point. However, in practice, this may occur, especially 
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Figure 2: Change of the average length of 95% Pis with the change of lag. The 
predictive uncertainty decreases when time approaches the bloom date. 



when the discrete time scale is coarse. We will need some further work to remove 
this restriction. 
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